-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added line cache #678
added line cache #678
Conversation
ttcn3/v2/syntax/syntax.go
Outdated
if t.cachedLineBegin <= pos && pos < t.cachedLineEnd { | ||
return Position{ | ||
Line: t.cacheLine + 1, | ||
Column: pos - t.lines[t.cacheLine] + 1, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache-handling is distributed over two methods (tree.position
and tree.searchLines
). I suggest you put all cache-related code into only one function (either in tree.position
or in tree.searchLines
, but not both). This increases cohesion and make the code easier to reason about.
ttcn3/v2/syntax/syntax.go
Outdated
t.cachedLineBegin = i | ||
t.cachedLineEnd = j |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those two lines are the reason why the benchmarks are not as good as expected and why the tests fail:
i
and j
represent lines, while cachedLineBegin
and cachedLineEnd
are supposed to represent (file) offsets. In line 288 you're essentially comparing lines with offsets:
if t.cachedLineBegin <= pos && pos < t.cachedLineEnd
I think those two variables should be set after you found the proper line:
t.cachedLineBegin = i | |
t.cachedLineEnd = j |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have now put the comparison steps in searchLines function as below so searchLines function will return cacheLine
if t.cachedLineBegin <= i && j < t.cachedLineEnd {
return t.cacheLine
}
is this correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost. Instead of i
and j
you should compare with pos
:
if t.cachedLineBegin <= pos && pos < t.cachedLineEnd {
return t.cacheLine
}
I'd like to explain a bit how location handling works, this will clear things up:
Overview
One core functionality of this packages is to provide start and end position for syntax nodes. We call this a text span and it includes a file name, line numbers, column numbers and the file offsets. We could store this information inside the node object. For example:
type Node struct {
// ...
Span struct {
Filename string
Begin, End struct {
Line, Column, Offset int
}
}
}
This approach is strait-forward and easy, but also quite wasteful. It would add 64 bytes of repetitive data to every node and sometimes we need this detailed information only for a few nodes.
So, how can we improve this situation?
Instead of storing the whole span, we store the begin and end offsets (8 bytes instead of 64) and provide an algorithm to compute the detailed information only when needed. For example the node would look like this:
type Node struct {
// ...
Begin, End int32
}
How do we get from offset to line numbers?
When creating the syntax tree, we keep track of newline characters (\n
) and store the start-offset for every line in a slice.
Example: Lines slice
The input Foo\nBar Baz\nFoobar\n
contains three lines, the first line starts is at offset 0, the second line starts at offset 4, the last line starts at offset 12. The resulting lines slice would like this:
lines := []int{0, 4, 12}
Example: Lookup
In which line is the word Baz
? We know from the syntax nodes, that the word begins at offset 8.
Now we search in the lines slice to find out to which line the offset 8 belongs to:
for i := range lines {
if 8 < lines[i] {
return i-1
}
}
return len(lines)-1
We need to subtract 1 from the found index, because this loop exits at one line after the one we were actually searching for; in this case it breaks at the third line (i==2
).
Binary Search
Our files have 300-1500 lines on average. As a consequence the described lookup becomes very inefficient, because it's just a linear search. We can do much better if we use a binary search algorithm instead.
Here are some articles I found explaining binary search:
The tree.searchLines
implements such a binary search. The variables i
and j
are the lower- and the higher limits required for the binary search. That's why i
starts at 0 and walks towards the end of the lines slice, while j
starts at the end (len(t.lines)
) and walks towards the beginning of the lines slice.
Caching
The caching is independent of those binary search variables. For caching we need to check if the pos
offset is on the same line as the last time searchLines
was called. We could do it this way:
if t.cachedLineBegin <= pos && pos < t.cachedLineEnd {
return t.cacheLine
}
I just realized you don't need store the offsets in cachedLineBegin
and cachedLineEnd
, but you can do the check directly in the lines-slice:
if len(t.lines) > 1 && t.lines[t.cachedLine] <= pos && pos < t.lines[t.cachedLine+1] {
return t.cachedLine
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for detailed explanation.
ttcn3/v2/syntax/syntax.go
Outdated
@@ -306,6 +314,7 @@ func (t *tree) searchLines(pos int) int { | |||
j = h | |||
} | |||
} | |||
t.cacheLine = int(i) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.cacheLine = int(i) - 1 | |
line := int(i) - 1 | |
t.cachedLineBegin = t.lines[line] | |
t.cachedLineEnd = t.lines[line+1] | |
t.cacheLine = line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have tried this but for line 319 its giving me runtime error: index out of range [1] with length 1
as line array contains only one element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. t.lines[lint+1]
does not work for the last line.
Maybe we could fix it with a condition? No very elegant, I know:
if line < len(t.lines) {
t.cachedLineEnd = t.lines[line+1]
} else {
t.cachedLineEnd = len(t.lines)
}
No description provided.